{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 第8章 提升方法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "1．提升方法是将弱学习算法提升为强学习算法的统计学习方法。在分类学习中，提升方法通过反复修改训练数据的权值分布，构建一系列基本分类器（弱分类器），并将这些基本分类器线性组合，构成一个强分类器。代表性的提升方法是AdaBoost算法。\n",
    "\n",
    "AdaBoost模型是弱分类器的线性组合：\n",
    "\n",
    "$$f(x)=\\sum_{m=1}^{M} \\alpha_{m} G_{m}(x)$$\n",
    "\n",
    "2．AdaBoost算法的特点是通过迭代每次学习一个基本分类器。每次迭代中，提高那些被前一轮分类器错误分类数据的权值，而降低那些被正确分类的数据的权值。最后，AdaBoost将基本分类器的线性组合作为强分类器，其中给分类误差率小的基本分类器以大的权值，给分类误差率大的基本分类器以小的权值。\n",
    "\n",
    "3．AdaBoost的训练误差分析表明，AdaBoost的每次迭代可以减少它在训练数据集上的分类误差率，这说明了它作为提升方法的有效性。\n",
    "\n",
    "4．AdaBoost算法的一个解释是该算法实际是前向分步算法的一个实现。在这个方法里，模型是加法模型，损失函数是指数损失，算法是前向分步算法。\n",
    "每一步中极小化损失函数\n",
    "\n",
    "$$\\left(\\beta_{m}, \\gamma_{m}\\right)=\\arg \\min _{\\beta, \\gamma} \\sum_{i=1}^{N} L\\left(y_{i}, f_{m-1}\\left(x_{i}\\right)+\\beta b\\left(x_{i} ; \\gamma\\right)\\right)$$\n",
    "\n",
    "得 到 参 数$\\beta_{m}, \\gamma_{m}$。\n",
    "\n",
    "5．提升树是以分类树或回归树为基本分类器的提升方法。提升树被认为是统计学习中最有效的方法之一。\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "# Boost\n",
    "\n",
    "“装袋”（bagging）和“提升”（boost）是构建组合模型的两种最主要的方法，所谓的组合模型是由多个基本模型构成的模型，组合模型的预测效果往往比任意一个基本模型的效果都要好。\n",
    "\n",
    "- 装袋：每个基本模型由从总体样本中随机抽样得到的不同数据集进行训练得到，通过重抽样得到不同训练数据集的过程称为装袋。\n",
    "\n",
    "- 提升：每个基本模型训练时的数据集采用不同权重，针对上一个基本模型分类错误的样本增加权重，使得新的模型重点关注误分类样本\n",
    "\n",
    "### AdaBoost\n",
    "\n",
    "AdaBoost是AdaptiveBoost的缩写，表明该算法是具有适应性的提升算法。\n",
    "\n",
    "算法的步骤如下：\n",
    "\n",
    "1）给每个训练样本（$x_{1},x_{2},….,x_{N}$）分配权重，初始权重$w_{1}$均为1/N。\n",
    "\n",
    "2）针对带有权值的样本进行训练，得到模型$G_m$（初始模型为G1）。\n",
    "\n",
    "3）计算模型$G_m$的误分率$e_m=\\sum_{i=1}^Nw_iI(y_i\\not= G_m(x_i))$\n",
    "\n",
    "4）计算模型$G_m$的系数$\\alpha_m=0.5\\log[(1-e_m)/e_m]$\n",
    "\n",
    "5）根据误分率e和当前权重向量$w_m$更新权重向量$w_{m+1}$。\n",
    "\n",
    "6）计算组合模型$f(x)=\\sum_{m=1}^M\\alpha_mG_m(x_i)$的误分率。\n",
    "\n",
    "7）当组合模型的误分率或迭代次数低于一定阈值，停止迭代；否则，回到步骤2）"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import pandas as pd\n",
    "from sklearn.datasets import load_iris\n",
    "from sklearn.model_selection  import train_test_split\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "# data\n",
    "def create_data():\n",
    "    iris = load_iris()\n",
    "    df = pd.DataFrame(iris.data, columns=iris.feature_names)\n",
    "    df['label'] = iris.target\n",
    "    df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']\n",
    "    data = np.array(df.iloc[:100, [0, 1, -1]])\n",
    "    for i in range(len(data)):\n",
    "        if data[i,-1] == 0:\n",
    "            data[i,-1] = -1\n",
    "    # print(data)\n",
    "    return data[:,:2], data[:,-1]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "X, y = create_data()\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.legend.Legend at 0x1ed9f4080c8>"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAXQAAAD5CAYAAAA3Os7hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAZ80lEQVR4nO3df4xdZZ3H8fd3h1k7KjIBxlVm6hbFNAqtVkaQdENccbdaa20QsURWq6zsGlwwuBgxBrUxKYZEXSTR8CMrClvsVqzA8mNVbPyxUjMFbNdWIiraGdhlKLbIWrSM3/3j3mmnd+7M3Oeee+55nmc+r2Qyc8995vT7nINfz5zzOeeauyMiIun7s6oLEBGRzlBDFxHJhBq6iEgm1NBFRDKhhi4ikgk1dBGRTBzV6kAz6wFGgDF3X9Xw3jrgKmCsvugad79+tvUdf/zxvmjRoqBiRUTmu+3btz/h7gPN3mu5oQOXALuBF8zw/tfc/YOtrmzRokWMjIwE/PMiImJmv57pvZZOuZjZEPAWYNajbhERqU6r59A/D3wE+NMsY95uZjvMbLOZLWw2wMwuNLMRMxsZHx8PrVVERGYxZ0M3s1XA4+6+fZZhtwOL3H0p8G3gxmaD3P1adx929+GBgaangEREpE2tnENfDqw2s5XAAuAFZnaTu58/OcDd904Zfx3wmc6WKSLSOQcPHmR0dJRnnnmm6lJmtGDBAoaGhujt7W35d+Zs6O5+OXA5gJm9Hvjnqc28vvzF7v5Y/eVqahdPRUSiNDo6ytFHH82iRYsws6rLmcbd2bt3L6Ojo5x44okt/17bOXQzW29mq+svLzazn5rZT4CLgXXtrldEpGzPPPMMxx13XJTNHMDMOO6444L/ggiJLeLuW4Gt9Z+vmLL80FG8SG62PDDGVfc8xKP7DnBCfx+XrVjMmmWDVZclBcXazCe1U19QQxeZb7Y8MMblt+7kwMEJAMb2HeDyW3cCqKlLdHTrv8gsrrrnoUPNfNKBgxNcdc9DFVUkubj77rtZvHgxJ510EldeeWVH1qmGLjKLR/cdCFou0oqJiQkuuugi7rrrLnbt2sXGjRvZtWtX4fXqlIvILE7o72OsSfM+ob+vgmqkKp2+jvLjH/+Yk046iZe+9KUArF27lm9+85u88pWvLFSnjtBFZnHZisX09fYcsayvt4fLViyuqCLptsnrKGP7DuAcvo6y5YGxOX93JmNjYyxcePiG+qGhIcbG2l/fJDV0kVmsWTbIhrOXMNjfhwGD/X1sOHuJLojOI2VcR3H3acs6kbrRKReROaxZNqgGPo+VcR1laGiIPXv2HHo9OjrKCSec0Pb6JukIXURkFjNdLylyHeW1r30tP//5z/nVr37FH//4R2655RZWr1499y/OQQ1dRGQWZVxHOeqoo7jmmmtYsWIFr3jFKzj33HM5+eSTi5aqUy4iIrOZPN3W6buFV65cycqVKztR4iFq6CIic0jlOopOuYiIZEINXUQkE2roIiKZUEMXEcmEGrqISCbU0CUbWx4YY/mV93LiR/+D5VfeW+hZGyJle9/73scLX/hCTjnllI6tUw1dslDGA5REyrRu3Truvvvujq5TDV2yoA+ikFLt2ASfOwU+2V/7vmNT4VWeeeaZHHvssR0o7jDdWCRZ0AdRSGl2bILbL4aD9f+W9u+pvQZYem51dTWhI3TJQhkPUBIB4DvrDzfzSQcP1JZHRg1dsqAPopDS7B8NW14hnXKRLJT1ACURjhmqnWZptjwyauiSjVQeoCSJOeuKI8+hA/T21ZYXcN5557F161aeeOIJhoaG+NSnPsUFF1xQaJ1q6FJYpz9AVyQqkxc+v7O+dprlmKFaMy94QXTjxo0dKO5IauhSyGT+ezIyOJn/BtTUJR9Lz40u0dKMLopKIcp/i8RDDV0KUf5bUuXuVZcwq3bqU0OXQpT/lhQtWLCAvXv3RtvU3Z29e/eyYMGCoN/TOXQp5LIVi484hw7Kf0v8hoaGGB0dZXx8vOpSZrRgwQKGhsKikWroUojy35Ki3t5eTjzxxKrL6Dg1dClM+W+ROLTc0M2sBxgBxtx9VcN7zwG+ApwK7AXe6e6PdLBOkSQoky9VCrkoegmwe4b3LgB+6+4nAZ8DPlO0MJHU6JnsUrWWGrqZDQFvAa6fYcjbgBvrP28GzjIzK16eSDqUyZeqtXqE/nngI8CfZnh/ENgD4O7PAvuB4xoHmdmFZjZiZiMxX10WaYcy+VK1ORu6ma0CHnf37bMNa7JsWsDT3a9192F3Hx4YGAgoUyR+yuRL1Vo5Ql8OrDazR4BbgDeY2U0NY0aBhQBmdhRwDPBkB+sUiZ6eyS5Vm7Ohu/vl7j7k7ouAtcC97n5+w7DbgPfUfz6nPibOW7BESrJm2SAbzl7CYH8fBgz297Hh7CVKuUjXtJ1DN7P1wIi73wbcAHzVzB6mdmS+tkP1iSRFmXypUlBDd/etwNb6z1dMWf4M8I5OFiby8S072bhtDxPu9Jhx3ukL+fSaJVWXJRIt3SkqUfr4lp3cdN9vDr2ecD/0Wk1dpDk9bVGitHFbk89wnGW5iKihS6QmZrimPtNyEVFDl0j1zHCj8UzLRUQNXSJ13ukLg5aLiC6KSqQmL3wq5SLSOqvq/p/h4WEfGRmp5N8WEUmVmW139+Fm7+kIXZp613U/4oe/OPz0huUvO5ab339GhRVVR884l1ToHLpM09jMAX74iyd513U/qqii6ugZ55ISNXSZprGZz7U8Z3rGuaREDV1kFnrGuaREDV1kFnrGuaREDV2mWf6yY4OW50zPOJeUqKHLNDe//4xpzXu+plz0jHNJiXLoIiIJUQ5dgpWVvQ5Zr/LfImHU0GWayez1ZFxvMnsNFGqoIestqwaRnOkcukxTVvY6ZL3Kf4uEU0OXacrKXoesV/lvkXBq6DJNWdnrkPUq/y0STg1dpikrex2yXuW/RcLpoqhMM3nRsdMJk5D1llWDSM6UQxcRSYhy6CWIISMdWkMMNYtIedTQ2xBDRjq0hhhqFpFy6aJoG2LISIfWEEPNIlIuNfQ2xJCRDq0hhppFpFxq6G2IISMdWkMMNYtIudTQ2xBDRjq0hhhqFpFy6aJoG2LISIfWEEPNIlIu5dBFRBJSKIduZguA7wHPqY/f7O6faBizDrgKGKsvusbdry9StHTex7fsZOO2PUy402PGeacv5NNrlhQeG0u+PZY6RKrSyimXPwBvcPenzawX+IGZ3eXu9zWM+5q7f7DzJUonfHzLTm667zeHXk+4H3rd2KhDxsaSb4+lDpEqzXlR1Guerr/srX9Vc55G2rZx256Wl4eMjSXfHksdIlVqKeViZj1m9iDwOPAtd9/WZNjbzWyHmW02s4UzrOdCMxsxs5Hx8fECZUuoiRmulTRbHjI2lnx7LHWIVKmlhu7uE+7+amAIOM3MTmkYcjuwyN2XAt8GbpxhPde6+7C7Dw8MDBSpWwL1mLW8PGRsLPn2WOoQqVJQDt3d9wFbgTc1LN/r7n+ov7wOOLUj1UnHnHd60z+ami4PGRtLvj2WOkSqNGdDN7MBM+uv/9wHvBH4WcOYF095uRrY3ckipbhPr1nC+a97yaGj7B4zzn/dS5omV0LGrlk2yIazlzDY34cBg/19bDh7SdcvRMZSh0iV5syhm9lSaqdQeqj9H8Amd19vZuuBEXe/zcw2UGvkzwJPAh9w95/NuFKUQxcRacdsOXTdWNSmsjLPIfnvMtcdMr8Ut0VydmyC76yH/aNwzBCcdQUsPbfqqqQC+oCLDisr8xyS/y5z3SHzS3FbJGfHJrj9YjhYT+zs31N7DWrqcgQ9nKsNZWWeQ/LfZa47ZH4pbovkfGf94WY+6eCB2nKRKdTQ21BW5jkk/13mukPml+K2SM7+0bDlMm+pobehrMxzSP67zHWHzC/FbZGcY4bClsu8pYbehrIyzyH57zLXHTK/FLdFcs66Anob/g+yt6+2XGQKXRRtQ1nPFp+82FdGsiNk3SHzS3FbJGfywqdSLjIHxRZFRBKi2KIAcWTLJXHKw0dNDX2eiCFbLolTHj56uig6T8SQLZfEKQ8fPTX0eSKGbLkkTnn46KmhzxMxZMslccrDR08NfZ6IIVsuiVMePnq6KDpPxJAtl8QpDx895dBFRBIyr3PoZeWpQ9Yby3O9lS2PTO6Z7tznF6JL2yLrhl5WnjpkvbE811vZ8sjknunOfX4hurgtsr4oWlaeOmS9sTzXW9nyyOSe6c59fiG6uC2ybuhl5alD1hvLc72VLY9M7pnu3OcXoovbIuuGXlaeOmS9sTzXW9nyyOSe6c59fiG6uC2ybuhl5alD1hvLc72VLY9M7pnu3OcXoovbIuuLomXlqUPWG8tzvZUtj0zume7c5xeii9tCOXQRkYTM6xx6WZRvF0nEHZfC9i+DT4D1wKnrYNVni683wpy9GnoblG8XScQdl8LIDYdf+8Th10WaeqQ5+6wvipZF+XaRRGz/ctjyVkWas1dDb4Py7SKJ8Imw5a2KNGevht4G5dtFEmE9YctbFWnOXg29Dcq3iyTi1HVhy1sVac5eF0XboHy7SCImL3x2OuUSac5eOXQRkYQUyqGb2QLge8Bz6uM3u/snGsY8B/gKcCqwF3inuz9SsO6mQvPfqT0DPCRbnvu2KDXnG5JNLquOMucXYUa6Y0LnlvO2aNDKKZc/AG9w96fNrBf4gZnd5e73TRlzAfBbdz/JzNYCnwHe2eliQ/PfqT0DPCRbnvu2KDXnG5JNLquOMucXaUa6I0LnlvO2aGLOi6Je83T9ZW/9q/E8zduAG+s/bwbOMut83CI0/53aM8BDsuW5b4tSc74h2eSy6ihzfpFmpDsidG45b4smWkq5mFmPmT0IPA58y923NQwZBPYAuPuzwH7guCbrudDMRsxsZHx8PLjY0Px3as8AD8mW574tSs35hmSTy6qjzPlFmpHuiNC55bwtmmipobv7hLu/GhgCTjOzUxqGNDsan9aF3P1adx929+GBgYHgYkPz36k9AzwkW577tig15xuSTS6rjjLnF2lGuiNC55bztmgiKIfu7vuArcCbGt4aBRYCmNlRwDHAkx2o7wih+e/UngEeki3PfVuUmvMNySaXVUeZ84s0I90RoXPLeVs00UrKZQA46O77zKwPeCO1i55T3Qa8B/gRcA5wr5eQhwzNf6f2DPCQbHnu26LUnG9INrmsOsqcX6QZ6Y4InVvO26KJOXPoZraU2gXPHmpH9Jvcfb2ZrQdG3P22erTxq8Ayakfma939l7OtVzl0EZFwhXLo7r6DWqNuXH7FlJ+fAd5RpEgRESkm+1v/k7uZRroj5GaTGG5MKfNmmtRunIphf0Qq64ae3M000h0hN5vEcGNKmTfTpHbjVAz7I2JZP20xuZtppDtCbjaJ4caUMm+mSe3GqRj2R8SybujJ3Uwj3RFys0kMN6aUeTNNajdOxbA/IpZ1Q0/uZhrpjpCbTWK4MaXMm2lSu3Eqhv0RsawbenI300h3hNxsEsONKWXeTJPajVMx7I+IZd3Q1ywbZMPZSxjs78OAwf4+Npy9RBdE57ul58Jbr4ZjFgJW+/7Wq5tfVAsZG0O9oePLml9q682EPuBCRCQhhW4sEpn3Qj4MIxap1RxLtjyWOtqkhi4ym5APw4hFajXHki2PpY4Csj6HLlJYyIdhxCK1mmPJlsdSRwFq6CKzCfkwjFikVnMs2fJY6ihADV1kNiEfhhGL1GqOJVseSx0FqKGLzCbkwzBikVrNsWTLY6mjADV0kdms+iwMX3D46NZ6aq9jvLg4KbWaY8mWx1JHAcqhi4gkRDl0KVeK2d2yai4r/53iNpauU0OXYlLM7pZVc1n57xS3sVRC59ClmBSzu2XVXFb+O8VtLJVQQ5diUszullVzWfnvFLexVEINXYpJMbtbVs1l5b9T3MZSCTV0KSbF7G5ZNZeV/05xG0sl1NClmBSzu2XVXFb+O8VtLJVQDl1EJCGz5dB1hC752LEJPncKfLK/9n3Hpu6vt6waRFqgHLrkoaysdsh6lReXiukIXfJQVlY7ZL3Ki0vF1NAlD2VltUPWq7y4VEwNXfJQVlY7ZL3Ki0vF1NAlD2VltUPWq7y4VEwNXfJQVlY7ZL3Ki0vFlEMXEUlIoRy6mS00s++a2W4z+6mZXdJkzOvNbL+ZPVj/0t+YqUsxT628ePm03aLWSg79WeDD7n6/mR0NbDezb7n7roZx33f3VZ0vUbouxTy18uLl03aL3pxH6O7+mLvfX//5d8BuYLDswqRCKeaplRcvn7Zb9IIuiprZImAZsK3J22eY2U/M7C4zO3mG37/QzEbMbGR8fDy4WOmSFPPUyouXT9stei03dDN7PvB14EPu/lTD2/cDf+nurwK+AGxptg53v9bdh919eGBgoN2apWwp5qmVFy+ftlv0WmroZtZLrZnf7O63Nr7v7k+5+9P1n+8Ees3s+I5WKt2TYp5aefHyabtFr5WUiwE3ALvdvemDnc3sRfVxmNlp9fXu7WSh0kUp5qmVFy+ftlv05syhm9lfAd8HdgJ/qi/+GPASAHf/kpl9EPgAtUTMAeBSd/+v2darHLqISLjZcuhzxhbd/QeAzTHmGuCa9sqTtu3YVEsY7B+tncc864r5fbR0x6Ww/cu1D2W2ntpHvxX9tCCRhOh56KlSJvhId1wKIzccfu0Th1+rqcs8oWe5pEqZ4CNt/3LYcpEMqaGnSpngI/lE2HKRDKmhp0qZ4CNZT9hykQypoadKmeAjnboubLlIhtTQU6VM8JFWfRaGLzh8RG49tde6ICrziJ6HLiKSkEI59PlkywNjXHXPQzy67wAn9Pdx2YrFrFmW0YMlc8+t5z6/GGgbR00NvW7LA2NcfutODhyspSLG9h3g8lt3AuTR1HPPrec+vxhoG0dP59DrrrrnoUPNfNKBgxNcdc9DFVXUYbnn1nOfXwy0jaOnhl736L4DQcuTk3tuPff5xUDbOHpq6HUn9PcFLU9O7rn13OcXA23j6Kmh1122YjF9vUfehNLX28NlKxZXVFGH5Z5bz31+MdA2jp4uitZNXvjMNuUyedEq14RC7vOLgbZx9JRDFxFJyGw5dJ1yEUnBjk3wuVPgk/217zs2pbFu6SqdchGJXZn5b2XLs6IjdJHYlZn/VrY8K2roIrErM/+tbHlW1NBFYldm/lvZ8qyooYvErsz8t7LlWVFDF4ldmc++13P1s6IcuohIQpRDFxGZB9TQRUQyoYYuIpIJNXQRkUyooYuIZEINXUQkE2roIiKZUEMXEcnEnA3dzBaa2XfNbLeZ/dTMLmkyxszsajN72Mx2mNlryilXCtFzr0Wy1srz0J8FPuzu95vZ0cB2M/uWu++aMubNwMvrX6cDX6x/l1joudci2ZvzCN3dH3P3++s//w7YDTR+0ObbgK94zX1Av5m9uOPVSvv03GuR7AWdQzezRcAyYFvDW4PAnimvR5ne9DGzC81sxMxGxsfHwyqVYvTca5HstdzQzez5wNeBD7n7U41vN/mVaU/9cvdr3X3Y3YcHBgbCKpVi9Nxrkey11NDNrJdaM7/Z3W9tMmQUWDjl9RDwaPHypGP03GuR7LWScjHgBmC3u392hmG3Ae+up11eB+x398c6WKcUpedei2SvlZTLcuDvgJ1m9mB92ceAlwC4+5eAO4GVwMPA74H3dr5UKWzpuWrgIhmbs6G7+w9ofo586hgHLupUUSIiEk53ioqIZEINXUQkE2roIiKZUEMXEcmEGrqISCbU0EVEMqGGLiKSCatFyCv4h83GgV9X8o/P7XjgiaqLKJHml66c5waaXyv+0t2bPgyrsoYeMzMbcffhqusoi+aXrpznBppfUTrlIiKSCTV0EZFMqKE3d23VBZRM80tXznMDza8QnUMXEcmEjtBFRDKhhi4ikol53dDNrMfMHjCzO5q8t87Mxs3swfrX31dRYxFm9oiZ7azXP9LkfTOzq83sYTPbYWavqaLOdrQwt9eb2f4p+y+pz9ozs34z22xmPzOz3WZ2RsP7ye47aGl+ye4/M1s8pe4HzewpM/tQw5hS9l8rn1iUs0uA3cALZnj/a+7+wS7WU4a/dveZbmR4M/Dy+tfpwBfr31Mx29wAvu/uq7pWTWf9C3C3u59jZn8OPLfh/dT33Vzzg0T3n7s/BLwaageNwBjwjYZhpey/eXuEbmZDwFuA66uupUJvA77iNfcB/Wb24qqLmu/M7AXAmdQ+yxd3/6O772sYluy+a3F+uTgL+IW7N94VX8r+m7cNHfg88BHgT7OMeXv9z6HNZrawS3V1kgP/aWbbzezCJu8PAnumvB6tL0vBXHMDOMPMfmJmd5nZyd0srqCXAuPAv9ZPCV5vZs9rGJPyvmtlfpDu/ptqLbCxyfJS9t+8bOhmtgp43N23zzLsdmCRuy8Fvg3c2JXiOmu5u7+G2p93F5nZmQ3vN/us2FRyrHPN7X5qz7x4FfAFYEu3CyzgKOA1wBfdfRnwf8BHG8akvO9amV/K+w+A+qmk1cC/N3u7ybLC+29eNnRgObDazB4BbgHeYGY3TR3g7nvd/Q/1l9cBp3a3xOLc/dH698epncM7rWHIKDD1L48h4NHuVFfMXHNz96fc/en6z3cCvWZ2fNcLbc8oMOru2+qvN1NrgI1jktx3tDC/xPffpDcD97v7/zZ5r5T9Ny8burtf7u5D7r6I2p9E97r7+VPHNJzPWk3t4mkyzOx5Znb05M/A3wL/3TDsNuDd9SvurwP2u/tjXS41WCtzM7MXmZnVfz6N2n/re7tdazvc/X+APWa2uL7oLGBXw7Ak9x20Nr+U998U59H8dAuUtP/me8rlCGa2Hhhx99uAi81sNfAs8CSwrsra2vAXwDfq/5s4Cvg3d7/bzP4RwN2/BNwJrAQeBn4PvLeiWkO1MrdzgA+Y2bPAAWCtp3Vb9D8BN9f/bP8l8N5M9t2kueaX9P4zs+cCfwP8w5Rlpe8/3fovIpKJeXnKRUQkR2roIiKZUEMXEcmEGrqISCbU0EVEMqGGLiKSCTV0EZFM/D/9eJgL33QQQgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "plt.scatter(X[:50,0],X[:50,1], label='0')\n",
    "plt.scatter(X[50:,0],X[50:,1], label='1')\n",
    "plt.legend()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "\n",
    "### AdaBoost in Python"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class AdaBoost:\n",
    "    def __init__(self, n_estimators=50, learning_rate=1.0):\n",
    "        self.clf_num = n_estimators\n",
    "        self.learning_rate = learning_rate\n",
    "\n",
    "    def init_args(self, datasets, labels):\n",
    "\n",
    "        self.X = datasets\n",
    "        self.Y = labels\n",
    "        self.M, self.N = datasets.shape\n",
    "\n",
    "        # 弱分类器数目和集合\n",
    "        self.clf_sets = []\n",
    "\n",
    "        # 初始化weights\n",
    "        self.weights = [1.0 / self.M] * self.M\n",
    "\n",
    "        # G(x)系数 alpha\n",
    "        self.alpha = []\n",
    "\n",
    "    def _G(self, features, labels, weights):\n",
    "        m = len(features)\n",
    "        error = 100000.0  # 无穷大\n",
    "        best_v = 0.0\n",
    "        # 单维features\n",
    "        features_min = min(features)\n",
    "        features_max = max(features)\n",
    "        n_step = (features_max - features_min +\n",
    "                  self.learning_rate) // self.learning_rate\n",
    "        # print('n_step:{}'.format(n_step))\n",
    "        direct, compare_array = None, None\n",
    "        for i in range(1, int(n_step)):\n",
    "            v = features_min + self.learning_rate * i\n",
    "\n",
    "            if v not in features:\n",
    "                # 误分类计算\n",
    "                compare_array_positive = np.array(\n",
    "                    [1 if features[k] > v else -1 for k in range(m)])\n",
    "                weight_error_positive = sum([\n",
    "                    weights[k] for k in range(m)\n",
    "                    if compare_array_positive[k] != labels[k]\n",
    "                ])\n",
    "\n",
    "                compare_array_nagetive = np.array(\n",
    "                    [-1 if features[k] > v else 1 for k in range(m)])\n",
    "                weight_error_nagetive = sum([\n",
    "                    weights[k] for k in range(m)\n",
    "                    if compare_array_nagetive[k] != labels[k]\n",
    "                ])\n",
    "\n",
    "                if weight_error_positive < weight_error_nagetive:\n",
    "                    weight_error = weight_error_positive\n",
    "                    _compare_array = compare_array_positive\n",
    "                    direct = 'positive'\n",
    "                else:\n",
    "                    weight_error = weight_error_nagetive\n",
    "                    _compare_array = compare_array_nagetive\n",
    "                    direct = 'nagetive'\n",
    "\n",
    "                # print('v:{} error:{}'.format(v, weight_error))\n",
    "                if weight_error < error:\n",
    "                    error = weight_error\n",
    "                    compare_array = _compare_array\n",
    "                    best_v = v\n",
    "        return best_v, direct, error, compare_array\n",
    "\n",
    "    # 计算alpha\n",
    "    def _alpha(self, error):\n",
    "        return 0.5 * np.log((1 - error) / error)\n",
    "\n",
    "    # 规范化因子\n",
    "    def _Z(self, weights, a, clf):\n",
    "        return sum([\n",
    "            weights[i] * np.exp(-1 * a * self.Y[i] * clf[i])\n",
    "            for i in range(self.M)\n",
    "        ])\n",
    "\n",
    "    # 权值更新\n",
    "    def _w(self, a, clf, Z):\n",
    "        for i in range(self.M):\n",
    "            self.weights[i] = self.weights[i] * np.exp(\n",
    "                -1 * a * self.Y[i] * clf[i]) / Z\n",
    "\n",
    "    # G(x)的线性组合\n",
    "    def _f(self, alpha, clf_sets):\n",
    "        pass\n",
    "\n",
    "    def G(self, x, v, direct):\n",
    "        if direct == 'positive':\n",
    "            return 1 if x > v else -1\n",
    "        else:\n",
    "            return -1 if x > v else 1\n",
    "\n",
    "    def fit(self, X, y):\n",
    "        self.init_args(X, y)\n",
    "\n",
    "        for epoch in range(self.clf_num):\n",
    "            best_clf_error, best_v, clf_result = 100000, None, None\n",
    "            # 根据特征维度, 选择误差最小的\n",
    "            for j in range(self.N):\n",
    "                features = self.X[:, j]\n",
    "                # 分类阈值，分类误差，分类结果\n",
    "                v, direct, error, compare_array = self._G(\n",
    "                    features, self.Y, self.weights)\n",
    "\n",
    "                if error < best_clf_error:\n",
    "                    best_clf_error = error\n",
    "                    best_v = v\n",
    "                    final_direct = direct\n",
    "                    clf_result = compare_array\n",
    "                    axis = j\n",
    "\n",
    "                # print('epoch:{}/{} feature:{} error:{} v:{}'.format(epoch, self.clf_num, j, error, best_v))\n",
    "                if best_clf_error == 0:\n",
    "                    break\n",
    "\n",
    "            # 计算G(x)系数a\n",
    "            a = self._alpha(best_clf_error)\n",
    "            self.alpha.append(a)\n",
    "            # 记录分类器\n",
    "            self.clf_sets.append((axis, best_v, final_direct))\n",
    "            # 规范化因子\n",
    "            Z = self._Z(self.weights, a, clf_result)\n",
    "            # 权值更新\n",
    "            self._w(a, clf_result, Z)\n",
    "\n",
    "\n",
    "#             print('classifier:{}/{} error:{:.3f} v:{} direct:{} a:{:.5f}'.format(epoch+1, self.clf_num, error, best_v, final_direct, a))\n",
    "#             print('weight:{}'.format(self.weights))\n",
    "#             print('\\n')\n",
    "\n",
    "    def predict(self, feature):\n",
    "        result = 0.0\n",
    "        for i in range(len(self.clf_sets)):\n",
    "            axis, clf_v, direct = self.clf_sets[i]\n",
    "            f_input = feature[axis]\n",
    "            result += self.alpha[i] * self.G(f_input, clf_v, direct)\n",
    "        # sign\n",
    "        return 1 if result > 0 else -1\n",
    "\n",
    "    def score(self, X_test, y_test):\n",
    "        right_count = 0\n",
    "        for i in range(len(X_test)):\n",
    "            feature = X_test[i]\n",
    "            if self.predict(feature) == y_test[i]:\n",
    "                right_count += 1\n",
    "\n",
    "        return right_count / len(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 例8.1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "X = np.arange(10).reshape(10, 1)\n",
    "y = np.array([1, 1, 1, -1, -1, -1, 1, 1, 1, -1])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "clf = AdaBoost(n_estimators=3, learning_rate=0.5)\n",
    "clf.fit(X, y)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [],
   "source": [
    "X, y = create_data()\n",
    "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.5151515151515151"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf = AdaBoost(n_estimators=10, learning_rate=0.2)\n",
    "clf.fit(X_train, y_train)\n",
    "clf.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "average score:63.394%\n"
     ]
    }
   ],
   "source": [
    "# 100次结果\n",
    "result = []\n",
    "for i in range(1, 101):\n",
    "    X, y = create_data()\n",
    "    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33)\n",
    "    clf = AdaBoost(n_estimators=100, learning_rate=0.2)\n",
    "    clf.fit(X_train, y_train)\n",
    "    r = clf.score(X_test, y_test)\n",
    "    # print('{}/100 score：{}'.format(i, r))\n",
    "    result.append(r)\n",
    "\n",
    "print('average score:{:.3f}%'.format(sum(result)))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### scikit-learn实例"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "-----\n",
    "#### sklearn.ensemble.AdaBoostClassifier\n",
    "\n",
    "- algorithm：这个参数只有AdaBoostClassifier有。主要原因是scikit-learn实现了两种Adaboost分类算法，SAMME和SAMME.R。两者的主要区别是弱学习器权重的度量，SAMME使用了和我们的原理篇里二元分类Adaboost算法的扩展，即用对样本集分类效果作为弱学习器权重，而SAMME.R使用了对样本集分类的预测概率大小来作为弱学习器权重。由于SAMME.R使用了概率度量的连续值，迭代一般比SAMME快，因此AdaBoostClassifier的默认算法algorithm的值也是SAMME.R。我们一般使用默认的SAMME.R就够了，但是要注意的是使用了SAMME.R， 则弱分类学习器参数base_estimator必须限制使用支持概率预测的分类器。SAMME算法则没有这个限制。\n",
    "\n",
    "\n",
    "- n_estimators： AdaBoostClassifier和AdaBoostRegressor都有，就是我们的弱学习器的最大迭代次数，或者说最大的弱学习器的个数。一般来说n_estimators太小，容易欠拟合，n_estimators太大，又容易过拟合，一般选择一个适中的数值。默认是50。在实际调参的过程中，我们常常将n_estimators和下面介绍的参数learning_rate一起考虑。\n",
    "\n",
    "\n",
    "-  learning_rate:  AdaBoostClassifier和AdaBoostRegressor都有，即每个弱学习器的权重缩减系数ν\n",
    "\n",
    "\n",
    "- base_estimator：AdaBoostClassifier和AdaBoostRegressor都有，即我们的弱分类学习器或者弱回归学习器。理论上可以选择任何一个分类或者回归学习器，不过需要支持样本权重。我们常用的一般是CART决策树或者神经网络MLP。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AdaBoostClassifier(learning_rate=0.5, n_estimators=100)"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "clf = AdaBoostClassifier(n_estimators=100, learning_rate=0.5)\n",
    "clf.fit(X_train, y_train)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.9090909090909091"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "clf.score(X_test, y_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 第8章提升方法-习题\n",
    "\n",
    "### 习题8.1\n",
    "&emsp;&emsp;某公司招聘职员考查身体、业务能力、发展潜力这3项。身体分为合格1、不合格0两级，业务能力和发展潜力分为上1、中2、下3三级分类为合格1 、不合格-1两类。已知10个人的数据，如下表所示。假设弱分类器为决策树桩。试用AdaBoost算法学习一个强分类器。  \n",
    "\n",
    "应聘人员情况数据表\n",
    "\n",
    "&emsp;&emsp;|1|2|3|4|5|6|7|8|9|10\n",
    "-|-|-|-|-|-|-|-|-|-|-\n",
    "身体|0|0|1|1|1|0|1|1|1|0\n",
    "业务|1|3|2|1|2|1|1|1|3|2\n",
    "潜力|3|1|2|3|3|2|2|1|1|1\n",
    "分类|-1|-1|-1|-1|-1|-1|1|1|-1|-1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "# 加载训练数据\n",
    "X = np.array([[0, 1, 3], [0, 3, 1], [1, 2, 2], [1, 1, 3], [1, 2, 3], [0, 1, 2],\n",
    "              [1, 1, 2], [1, 1, 1], [1, 3, 1], [0, 2, 1]])\n",
    "y = np.array([-1, -1, -1, -1, -1, -1, 1, 1, -1, -1])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**AdaBoostClassifier分类器实现：**\n",
    "\n",
    "采用sklearn的AdaBoostClassifier分类器直接求解，由于AdaBoostClassifier分类器默认采用CART决策树弱分类器，故不需要设置base_estimator参数。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "原始输出: [-1 -1 -1 -1 -1 -1  1  1 -1 -1]\n",
      "预测输出: [-1 -1 -1 -1 -1 -1  1  1 -1 -1]\n",
      "预测正确率：100.00%\n"
     ]
    }
   ],
   "source": [
    "from sklearn.ensemble import AdaBoostClassifier\n",
    "\n",
    "clf = AdaBoostClassifier()\n",
    "clf.fit(X, y)\n",
    "y_predict = clf.predict(X)\n",
    "score = clf.score(X, y)\n",
    "print(\"原始输出:\", y)\n",
    "print(\"预测输出:\", y_predict)\n",
    "print(\"预测正确率：{:.2%}\".format(score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**自编程实现：**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 自编程求解例8.1\n",
    "import numpy as np\n",
    "\n",
    "\n",
    "class AdaBoost:\n",
    "    def __init__(self, X, y, tol=0.05, max_iter=10):\n",
    "        # 训练数据 实例\n",
    "        self.X = X\n",
    "        # 训练数据 标签\n",
    "        self.y = y\n",
    "        # 训练中止条件 right_rate>self.tol\n",
    "        self.tol = tol\n",
    "        # 最大迭代次数\n",
    "        self.max_iter = max_iter\n",
    "        # 初始化样本权重w\n",
    "        self.w = np.full((X.shape[0]), 1 / X.shape[0])\n",
    "        self.G = []  # 弱分类器\n",
    "\n",
    "    def build_stump(self):\n",
    "        \"\"\"\n",
    "        以带权重的分类误差最小为目标，选择最佳分类阈值\n",
    "        best_stump['dim'] 合适的特征所在维度\n",
    "        best_stump['thresh']  合适特征的阈值\n",
    "        best_stump['ineq']  树桩分类的标识lt,rt\n",
    "        \"\"\"\n",
    "        m, n = np.shape(self.X)\n",
    "        # 分类误差\n",
    "        e_min = np.inf\n",
    "        # 小于分类阈值的样本属于的标签类别\n",
    "        sign = None\n",
    "        # 最优分类树桩\n",
    "        best_stump = {}\n",
    "        for i in range(n):\n",
    "            range_min = self.X[:, i].min()  # 求每一种特征的最大最小值\n",
    "            range_max = self.X[:, i].max()\n",
    "            step_size = (range_max - range_min) / n\n",
    "            for j in range(-1, int(n) + 1):\n",
    "                thresh_val = range_min + j * step_size\n",
    "                # 计算左子树和右子树的误差\n",
    "                for inequal in ['lt', 'rt']:\n",
    "                    predict_vals = self.base_estimator(self.X, i, thresh_val,\n",
    "                                                       inequal)\n",
    "                    err_arr = np.array(np.ones(m))\n",
    "                    err_arr[predict_vals.T == self.y.T] = 0\n",
    "                    weighted_error = np.dot(self.w, err_arr)\n",
    "                    if weighted_error < e_min:\n",
    "                        e_min = weighted_error\n",
    "                        sign = predict_vals\n",
    "                        best_stump['dim'] = i\n",
    "                        best_stump['thresh'] = thresh_val\n",
    "                        best_stump['ineq'] = inequal\n",
    "        return best_stump, sign, e_min\n",
    "\n",
    "    def updata_w(self, alpha, predict):\n",
    "        \"\"\"\n",
    "        更新样本权重w\n",
    "        \"\"\"\n",
    "        # 以下2行根据公式8.4 8.5 更新样本权重\n",
    "        P = self.w * np.exp(-alpha * self.y * predict)\n",
    "        self.w = P / P.sum()\n",
    "\n",
    "    @staticmethod\n",
    "    def base_estimator(X, dimen, threshVal, threshIneq):\n",
    "        \"\"\"\n",
    "        计算单个弱分类器（决策树桩）预测输出\n",
    "        \"\"\"\n",
    "        ret_array = np.ones(np.shape(X)[0])  # 预测矩阵\n",
    "        # 左叶子 ，整个矩阵的样本进行比较赋值\n",
    "        if threshIneq == 'lt':\n",
    "            ret_array[X[:, dimen] <= threshVal] = -1.0\n",
    "        else:\n",
    "            ret_array[X[:, dimen] > threshVal] = -1.0\n",
    "        return ret_array\n",
    "\n",
    "    def fit(self):\n",
    "        \"\"\"\n",
    "        对训练数据进行学习\n",
    "        \"\"\"\n",
    "        G = 0\n",
    "        for i in range(self.max_iter):\n",
    "            best_stump, sign, error = self.build_stump()  # 获取当前迭代最佳分类阈值\n",
    "            alpha = 1 / 2 * np.log((1 - error) / error)  # 计算本轮弱分类器的系数\n",
    "            # 弱分类器权重\n",
    "            best_stump['alpha'] = alpha\n",
    "            # 保存弱分类器\n",
    "            self.G.append(best_stump)\n",
    "            # 以下3行计算当前总分类器（之前所有弱分类器加权和）分类效率\n",
    "            G += alpha * sign\n",
    "            y_predict = np.sign(G)\n",
    "            error_rate = np.sum(\n",
    "                np.abs(y_predict - self.y)) / 2 / self.y.shape[0]\n",
    "            if error_rate < self.tol:  # 满足中止条件 则跳出循环\n",
    "                print(\"迭代次数:\", i + 1)\n",
    "                break\n",
    "            else:\n",
    "                self.updata_w(alpha, y_predict)  # 若不满足，更新权重，继续迭代\n",
    "\n",
    "    def predict(self, X):\n",
    "        \"\"\"\n",
    "        对新数据进行预测\n",
    "        \"\"\"\n",
    "        m = np.shape(X)[0]\n",
    "        G = np.zeros(m)\n",
    "        for i in range(len(self.G)):\n",
    "            stump = self.G[i]\n",
    "            # 遍历每一个弱分类器，进行加权\n",
    "            _G = self.base_estimator(X, stump['dim'], stump['thresh'],\n",
    "                                     stump['ineq'])\n",
    "            alpha = stump['alpha']\n",
    "            G += alpha * _G\n",
    "        y_predict = np.sign(G)\n",
    "        return y_predict.astype(int)\n",
    "\n",
    "    def score(self, X, y):\n",
    "        \"\"\"对训练效果进行评价\"\"\"\n",
    "        y_predict = self.predict(X)\n",
    "        error_rate = np.sum(np.abs(y_predict - y)) / 2 / y.shape[0]\n",
    "        return 1 - error_rate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "迭代次数: 8\n",
      "原始输出: [-1 -1 -1 -1 -1 -1  1  1 -1 -1]\n",
      "预测输出: [-1 -1 -1 -1 -1 -1  1  1 -1 -1]\n",
      "预测正确率：100.00%\n"
     ]
    }
   ],
   "source": [
    "clf = AdaBoost(X, y)\n",
    "clf.fit()\n",
    "y_predict = clf.predict(X)\n",
    "score = clf.score(X, y)\n",
    "print(\"原始输出:\", y)\n",
    "print(\"预测输出:\", y_predict)\n",
    "print(\"预测正确率：{:.2%}\".format(score))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 习题8.2\n",
    "&emsp;&emsp;比较支持向量机、 AdaBoost 、Logistic回归模型的学习策略与算法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**解答：**  \n",
    "- **支持向量机**  \n",
    "学习策略：极小化正则化合页损失，软间隔最大化；  \n",
    "学习算法：序列最小最优化算法（SMO）  \n",
    "- **AdaBoost**  \n",
    "学习策略：极小化加法模型指数损失；  \n",
    "学习算法：前向分步加法算法  \n",
    "- **Logistic回归**  \n",
    "学习策略：极大似然估计，正则化的极大似然估计；  \n",
    "学习算法：改进的迭代尺度算法，梯度下降，拟牛顿法"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "----\n",
    "参考代码：https://github.com/wzyonggege/statistical-learning-method\n",
    "\n",
    "本文代码更新地址：https://github.com/fengdu78/lihang-code\n",
    "\n",
    "习题解答：https://github.com/datawhalechina/statistical-learning-method-solutions-manual\n",
    "\n",
    "中文注释制作：机器学习初学者公众号：ID:ai-start-com\n",
    "\n",
    "配置环境：python 3.5+\n",
    "\n",
    "代码全部测试通过。\n",
    "![gongzhong](../gongzhong.jpg)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
